Design  of  Experiments  for 
Information  Technology  Systems 


What  Program  Managers  Should  Know 
About  the  Plan  and  Design  Phases 


Rachel  T.  Silvestrini,  Ph.D.  ■  Maj.  William  J.  Parker  III  ■  Ginger  Sammito 


Recent mandates require that rigorous statistical and
mathematical approaches be applied to all tests that fall under
developmental and operational test and evaluation (T&E). On
October 19, 2010, J. Michael Gilmore, director of Operational
Test and Evaluation, released a memorandum to the T&E
community within the DoD that describes an initiative designed
to increase the use of scientific


Silvestrini  is  an  assistant  professor  in  the  Operations  Research  Department  at  the  Naval 
Postgraduate  School.  Parker  is  a  C4ISR  systems  operational  test  director  and  operations 
research  system  analyst  for  the  Homeland  Security/Information  Assurance  Portfolio  at  the 
Joint  Interoperability  Test  Command.  Sammito  is  a  principal  operations  research  system 
analyst  for  the  Force  Application/Force  Protection  Portfolio  at  the  Joint  Interoperability  Test 
Command. 




Defense  AT&L:  September-October  2012 




and statistical methods to develop rigorous approaches to test
and data analysis. Dr. Gilmore's memo specifies the need for
rigorous, statistically based testing methods to ensure that
proper and sufficient data are collected to answer the question
of interest. In addition, Edward R. Greer, the director of
Developmental Test and Evaluation, has made skill sets in design
of experiments (DoE), statistics, and test design principles one
of his top initiatives for rejuvenating and developing the T&E
workforce.

The framework that encompasses the statistical and mathematical
approaches for T&E is called scientific based test design (SBTD).
SBTD can be applied to all fields and application areas within
the T&E realm. There is no set of T&E experiments to which SBTD
does not apply. For example, consider the program manager (PM)
who is involved with IT systems and feels that SBTD cannot be
applied to his or her system because the measure of interest in
the experiment is a binary outcome: did the system work (yes or
no)? Although this is a formidable challenge that must be
considered prior to running the experiment, it is not a
showstopper.

SBTD is a framework that includes statistically based methods
for T&E such as DoE and regression analysis. DoE is a formal
approach for developing the set of tests to be carried out in an
experiment. An experiment is a large number of individual tests
(also called trials or runs) in which variables are manipulated
and data are collected.

There are abundant sources of literature on DoE that describe
the mathematical and statistically based tactics for designing
an experiment and analyzing its results to meet the needs of any
experimental goal. These methods ensure that valid, objective,
and scientific conclusions are reached. Additionally, the use of
DoE ensures that the experiment is planned in a way that
minimizes the resources spent while maximizing the information
obtained. Figure 1 highlights the four phases of the DoE
approach: Plan, Design, Execute, and Analyze.

Compared with the T&E of traditional weapons systems such as
aircraft, tanks, artillery, and maritime vessels, IT systems
testing may present the PM with slightly different challenges in
the T&E processes. However, the phases of the DoE process do not
change. While this article is primarily aimed at the PM involved
in T&E of IT systems, it is intended to be beneficial reading
for any PM involved with T&E in the DoD. The remainder of this
article briefly covers how to apply the first two phases of DoE
through an example application to an IT system. Where
appropriate, specific challenges one might encounter are
highlighted.

Applying  Science  Based  Testing  Designs 

The DoE approach to the experiments conducted during the T&E
process is displayed in Figure 1. The first two phases of this
process (Plan and Design) will be discussed through an example
application to an IT system.

Suppose that a PM is in charge of oversight for a new software
application being developed as a test tool. The experiment used
to test the software is called Bravo Test. During Bravo Test,
different message types for multiple platforms with an
Identification Friend or Foe (IFF) system are both transmitted
and received. A DoD architecture framework is illustrated in
Figure 2. Bravo Test will take place at the systems level
(middle view).


Figure  1.  Design  of  Experiments  (DoE)  Process 




Compliments:  46th  Test  Wing  OA 


Phase  1:  Plan 

The first phase in the DoE process is Plan. This phase includes
a statement of the goal of the experiment as well as the
development of a list of variables involved in the experiment.
There are three types of variables that are important to list:

■  variables  that  will  be  manipulated  or  controlled  during  the 
experiment 

■  variables that cannot be controlled, but may change during
the experiment

■  variables  used  to  measure  the  system  (outcomes) 

The goal of Bravo Test is to test the accuracy and timeliness of
messages transmitted and received. The first objective of Bravo
Test is to determine whether each of four different platforms
transmits or receives messages with an accuracy rate above 99
percent. The second objective is to model the expected time to
transmit and receive a message as a function of the different
platforms, identification systems, and types of message. The PM
should be aware that recognizing the goal and objectives of a
test often aids in identifying the variables present in the
experiment.

Table 1 illustrates the three controllable variables that will
be manipulated (changed) over the course of Bravo Test.
Remember: both the variables that can be controlled and those
that cannot be controlled should be identified. For example,
during Bravo Test the average system load during the
transmission of a message may be measurable, but it may not be
directly controllable. The PM should strive to identify as many
uncontrollable variables as possible and keep in mind that a few
variables may not be known initially but will emerge later. This
should not be a stumbling point, but an opportunity for the PM
to refine the test during the next cycle with more information.
This involves going back to the Plan phase and proceeding from
there.

Table 1. Example Factors to Be Varied During Bravo Test

Controllable Variables               Settings During Test
-----------------------------------  ------------------------------------------------------------
IFF (Identification Friend or Foe)   Range 0 - 5
Message types                        UTF-8, UTF-16, UTF-32 (UTF = Unicode Transformation Format)
Producing or Consuming Platforms     A, B, C, D

In Bravo Test, there are two outcome variables: (1) accuracy of
message and (2) time to transmit/receive message. Accuracy is a
binary variable: if the message is 100 percent correct, the data
point is recorded as 1 (accurate); otherwise it is recorded as 0
(not accurate). In IT systems testing, a binary response is a
common metric of interest. Also, many outcome variables may be
collected for a single test within the experiment; this is
important to note because it is used when assessing the number
of tests required for the experiment.
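
As a minimal sketch of the binary coding described above, the
Python fragment below scores each trial as 1 or 0 and computes an
observed accuracy rate. The function names and the sample messages
are hypothetical illustrations and are not taken from Bravo Test.

```python
def score_message(sent: str, received: str) -> int:
    """Return 1 if the received message matches the sent message exactly, else 0."""
    return 1 if sent == received else 0


def accuracy_rate(scores: list[int]) -> float:
    """Proportion of trials scored as accurate (1)."""
    if not scores:
        raise ValueError("at least one trial is required")
    return sum(scores) / len(scores)


# Hypothetical trials: (message sent, message received).
# The third message is deliberately corrupted by one character.
trials = [
    ("ALPHA", "ALPHA"),
    ("BRAVO", "BRAVO"),
    ("CHARLIE", "CHARLXE"),
    ("DELTA", "DELTA"),
]
scores = [score_message(sent, received) for sent, received in trials]
print(scores, accuracy_rate(scores))
```

Because any deviation from a fully correct message is scored as 0,
the outcome carries less information per trial than a continuous
measure such as transmission time, which is one reason binary
responses tend to need more tests.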

Without proper care in the Plan phase, the direction of the
experiment may become unclear. This leads to the collection of
erroneous or incomplete information, which will prevent the
experimental goals from being met. Determining the variables of
interest in an experiment is often a difficult task that should
be undertaken with care. Fishbone diagrams and other
brainstorming techniques often work well during subject matter
expert meetings to discuss variable selection.


Figure 2. DoD Architecture Framework
with Systems View in Center


The  Operational  View  describes 
and  interrelates  the  operational 
elements,  tasks  and  activities,  and 
information  flows  required  to 
accomplish  mission  operations. 


The  Systems  View  describes  and 
interrelates  the  existing  or  postu¬ 
lated  technologies,  systems,  and 
other  resources  intended  to  support 
the  operational  requirements. 


The  Technical  View  describes  the 
profile  of  rules,  standards,  and 
conventions  governing  systems 
implementation  and  forecasts  their 
future  direction. 


DoD  Architectural  Framework  (DoDAF) 


Phase  2:  Design 

The Design Phase involves mapping out the sets of tests that
will be conducted during the experiment. Specifically, this
phase involves the selection of the design type and the
determination of the number of tests to be conducted in the
experiment (also known as sample size). Each test involves the
control and manipulation of variables identified in the Plan
Phase. There are a number of different experimental design
techniques found in various textbooks, journal articles,
technical reports, and case studies.

Figure 3. JMP User Interface for the
Development of Full Factorial Design

Examples of design selections include factorial design,
fractional factorial design, central composite design, covering
array, and optimal design. While a PM does not necessarily need
to know each different design, he or she should recognize that
different designs are appropriate for different experimental
goals. For example, a fractional factorial design is an
appropriate design choice when the experimental goal involves
finding the subset of factors that influence the outcome
variable of interest. This is a goal typically encountered in
the early phase of testing. For situations involving multiple
responses with overlapping or conflicting goals, a hybrid design
approach, in which different design choices are combined, can be
used to satisfy all objectives of the experiment.
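
To illustrate why a fractional factorial design is economical for
screening, the sketch below builds a half fraction of a two-level,
three-factor design. The factors A, B, and C and their coded
levels (-1 and +1) are hypothetical; the Bravo Test factors have
more than two levels, so this is not the Bravo Test design. The
defining relation I = ABC used here is one common textbook choice,
assumed for illustration.

```python
from itertools import product

# Full two-level factorial for three hypothetical factors A, B, C (coded -1 / +1).
full_design = list(product((-1, 1), repeat=3))

# Half fraction: keep only the runs that satisfy the defining relation I = ABC,
# i.e. the product of the three coded levels equals +1.
half_fraction = [run for run in full_design if run[0] * run[1] * run[2] == 1]

print(len(full_design), "runs in the full factorial")
print(len(half_fraction), "runs in the half fraction")
for a, b, c in half_fraction:
    print(f"A={a:+d}  B={b:+d}  C={c:+d}")
```

The half fraction needs four runs instead of eight. The price is
confounding: under this defining relation each main effect is
aliased with a two-factor interaction (for example, C with AB),
which is usually acceptable when the goal is only to find the
factors that matter.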

In addition to design choice, the number of tests to run (the
sample size) of an experiment must be determined during this
phase. Given the opportunity, a PM might prefer an unlimited
sample size. However, cost, time, and resource constraints often
drive sample size choices.


Figure 4. JMP Full Factorial Table Design

For Bravo Test, a full factorial design with four replicates is
selected to support the goals of testing the accuracy and
timeliness of messages transmitted and received. A statistical
software package, such as JMP (illustrated), can be used to
create the design. Snapshots of the design creation are shown in
Figure 3 and Figure 4. Figure 3 illustrates the user interface
that guides the inputs to the development of the design. Figure
4 contains the design. The design dictates the settings for
every experimental test. For example, the first experimental
test will be conducted with IFF = 2, Message Type = UTF-16, and
Platform = D.
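
A design table of this kind can also be sketched in a few lines
of general-purpose code. The Python fragment below is only an
illustration and is not the JMP output shown in Figure 4. It
assumes that "Range 0 - 5" means the six integer IFF settings 0
through 5, that "four replicates" means each combination is run
four times in total, and that the run order is randomized; the
seed value is arbitrary.

```python
import random
from itertools import product

# Factor levels taken from Table 1. Treating "Range 0 - 5" as the six integer
# settings 0..5 is an assumption made for this sketch.
factors = {
    "IFF": [0, 1, 2, 3, 4, 5],
    "Message Type": ["UTF-8", "UTF-16", "UTF-32"],
    "Platform": ["A", "B", "C", "D"],
}
REPLICATES = 4  # assumed to mean each combination is run four times in total

# Every combination of factor levels, repeated once per replicate.
combinations = list(product(*factors.values()))
runs = [dict(zip(factors, combo)) for combo in combinations for _ in range(REPLICATES)]

# Randomize the run order; the fixed seed only makes the sketch reproducible.
random.Random(2012).shuffle(runs)

print(len(combinations), "unique combinations,", len(runs), "total runs")
print("First run:", runs[0])
```

Under these assumptions the design has 6 x 3 x 4 = 72 unique
combinations and 288 runs in total, which shows how quickly a
full factorial grows as factors, levels, and replicates are added.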

A full factorial design is appropriate for the needs of Bravo
Test. In Bravo Test, simple relationships between IFF, Message
Type, and Platform will be investigated. In other situations,
different designs may be more apt. The factorial design dictates
a baseline number of runs in the experiment. That number can be
altered by repetition of the experiment (as seen in one of the
selection tabs in Figure 3). It is important for the PM to
realize that within a resource-constrained environment, a single
experiment cannot provide unlimited answers. Both design choice
and sample size restrictions translate to restrictions on what
information can be obtained. Statistical and mathematical
analysis can greatly help overcome the sample size dilemma by
focusing on answering the following:

■  Given a fixed sample size, what information can be measured
and modeled?

■  Given measurement or modeling requirements, what sample size
is required?

The first question involves identifying risks in the constrained
environment, and the second involves determining sample size
requirements based on the risks the experimenter is willing to
accept. Risks can be discussed in terms of confidence level
and/or power of mathematical estimation. These are two terms
related to statistical analysis with which PMs should be, or
become, familiar.
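
The two questions can be made concrete for a binary accuracy
outcome with a simplified zero-failure calculation. This sketch
assumes independent trials with a constant accuracy rate and
assumes that every trial in the sample turns out accurate. It is
one standard textbook approach, not the analysis used for Bravo
Test, and the 95 percent confidence level and the fixed sample
size of 72 are illustrative choices.

```python
import math


def demonstrated_accuracy(n_trials: int, confidence: float) -> float:
    """Question 1: lower confidence bound on accuracy after n error-free trials."""
    return (1.0 - confidence) ** (1.0 / n_trials)


def trials_needed(target_accuracy: float, confidence: float) -> int:
    """Question 2: error-free trials needed to show accuracy above the target."""
    return math.ceil(math.log(1.0 - confidence) / math.log(target_accuracy))


n_fixed = 72  # hypothetical fixed sample size
print(round(demonstrated_accuracy(n_fixed, 0.95), 4))
print(trials_needed(0.99, 0.95))
```

Under these assumptions, 299 consecutive accurate messages would
be needed to support a claim of accuracy above 99 percent at 95
percent confidence. Accepting more risk (a lower confidence
level) reduces that number, which is the trade-off the PM is
being asked to weigh.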

During the Design Phase, the PM should encourage documentation
of the methodology, including the rationale for selecting a
design and sample size and the lessons learned from the process.
Clear documentation will help the PM face the challenges of the
iterative DoE process and development stages as the software
moves toward maturity.

Conclusion 

SBTD methods, specifically DoE, can and should be applied to T&E
of IT systems. There are many case studies that document the
success of the DoE approach for both IT and non-IT systems. This
article covered the Plan and Design phases in the DoE approach.
It is believed that the Plan and Design phases are of utmost
importance because an inadequately designed experiment will
yield poor results and possibly incorrect conclusions, thus
making the Execute and Analyze phases meaningless.

The Execute Phase refers to the running of each test in the
experiment. For Bravo Test, the experiment to be run is
illustrated in Figure 4. During this phase, it is imperative
that each test is run to specification. This involves ensuring
that proper blocking, randomization, and replication are carried
out as specified by the design. The Analyze Phase encompasses a
mathematical study of the resulting data to obtain valid and
objective conclusions.

Sometimes the challenges and decisions in creating an
experimental design approach appear endless for the PM,
especially as requirements shift from traditional testing to
rigorous SBTD for IT systems. The PM must ensure compliance with
applicable policies. The PM is also responsible for quality and
consistency with those standards while developing test reports
based on a sound scientific rigor that has not formally been
part of any IT system or program. The PM needs to look beyond
the present in facing these SBTD challenges in IT systems and
focus on the valid, objective, and measurable approach that
ultimately saves time and money over the development cycle of
the IT system.

The authors can be reached at rtsilves@nps.edu,
william.j.parker60@mail.mil, and ginger.j.sammito.civ@mail.mil.




